RP-Filter: A Path-Based Triple Filtering Method for Efficient SPARQL Query Processing

Authors

  • Kisung Kim
  • Bongki Moon
  • Hyoung-Joo Kim

Abstract

With the rapid growth of RDF data, SPARQL query processing has received much attention. Currently, most RDF databases store RDF data in a relational table called a triple table and process SPARQL queries by performing several join operations on that table. However, execution plans with many joins can be inefficient because a large amount of intermediate data is passed between the join operations. In this paper, we propose a triple filtering method called RP-Filter to reduce the amount of intermediate data. RP-Filter exploits the path information in query graphs to filter out, before the joins are performed, triples that would not appear in the final results. We also propose RFLT, an efficient relational operator that filters triples by means of RP-Filter. Experimental results on synthetic and real-life RDF data show that RP-Filter can effectively reduce intermediate results and accelerate SPARQL query processing.
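The abstract does not describe the internals of RP-Filter or the RFLT operator, so the following is only a toy sketch of the general idea it names: use the predicate path leading into a triple pattern to discard triples before the joins. Everything in it (the `incoming_paths` index, `eval_chain`, the `max_len` bound, and the sample triples) is hypothetical and is not the authors' data structure or operator. It evaluates a chain query `?v0 p0 ?v1 . ?v1 p1 ?v2 ...` over a triple table with hash joins and counts the intermediate tuples, with and without a pre-join filter that keeps a triple only if its subject can be reached by (a suffix of) the predicate path preceding that pattern in the query.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]                # (subject, predicate, object)
PathIndex = Dict[str, Set[Tuple[str, ...]]]  # hypothetical: node -> incoming predicate paths


def incoming_paths(triples: List[Triple], max_len: int) -> PathIndex:
    """Map each node to the predicate sequences (length 1..max_len) of walks ending at it."""
    paths: PathIndex = defaultdict(set)
    for s, p, o in triples:
        paths[o].add((p,))
    for _ in range(max_len - 1):
        extended: PathIndex = defaultdict(set)
        for s, p, o in triples:
            # Every walk that ends at s can be extended by the edge s -p-> o.
            for path in paths.get(s, set()):
                if len(path) < max_len:
                    extended[o].add(path + (p,))
        for node, new_paths in extended.items():
            paths[node] |= new_paths
    return dict(paths)


def eval_chain(triples: List[Triple], preds: List[str],
               index: Optional[PathIndex] = None, max_len: int = 1):
    """Evaluate ?v0 preds[0] ?v1 . ?v1 preds[1] ?v2 ... with hash joins.

    Returns (bindings, number of tuples produced by the scans and joins).
    """
    inputs = []
    for i, pred in enumerate(preds):
        rows = [(s, o) for s, p, o in triples if p == pred]  # triple-table scan
        if index is not None and i > 0:
            # Pre-join filter: any result binding reaches this subject via preds[:i],
            # hence also via its last max_len predicates, so no answer is lost.
            prefix = tuple(preds[max(0, i - max_len):i])
            rows = [(s, o) for s, o in rows if prefix in index.get(s, set())]
        inputs.append(rows)

    cost = sum(len(rows) for rows in inputs)
    bindings = [(s, o) for s, o in inputs[0]]
    for rows in inputs[1:]:
        by_subject: Dict[str, List[str]] = defaultdict(list)
        for s, o in rows:
            by_subject[s].append(o)
        # Join on the last bound variable (object of the previous pattern).
        bindings = [b + (o,) for b in bindings for o in by_subject.get(b[-1], [])]
        cost += len(bindings)
    return bindings, cost


TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "globex"),
    ("dave", "worksAt", "initech"),
    ("acme", "locatedIn", "paris"),
    ("globex", "locatedIn", "berlin"),
]
QUERY = ["knows", "worksAt", "locatedIn"]

index = incoming_paths(TRIPLES, max_len=2)
plain, plain_cost = eval_chain(TRIPLES, QUERY)
filtered, filtered_cost = eval_chain(TRIPLES, QUERY, index, max_len=2)
print(filtered, plain == filtered, plain_cost, filtered_cost)
# [('alice', 'bob', 'acme', 'paris')] True 8 5
```

Both runs return the same answer, but the filtered run drops the `worksAt` triples of `carol` and `dave` and the `locatedIn` triple of `globex` before any join, so fewer tuples flow through the plan. Bounding paths by `max_len` keeps this toy index small at the cost of weaker pruning for long queries; how the paper actually trades these off is not stated in the abstract.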


Similar resources

SwarmGuide: Towards Multiple-Query Optimization in Graph Databases

Preliminaries. A graph database $G$ is a finite, directed, edge-labeled multigraph defined by $G = \langle N, \Sigma, E \rangle$, where $N$ is a finite set of nodes (vertices), $\Sigma$ is a set of labels, and $E \subseteq N \times \Sigma \times N$ is a set of directed, labeled edges. A path $p$ in $G$ is defined as a sequence $n_0 a_0 n_1 \cdots n_{k-1} a_{k-1} n_k$ where $n_i \in N$, $a_i \in \Sigma$, and $\langle n_i, a_i, n_{i+1} \rangle \in E$ for $0 \le i < k$. We call the sequence of edge labels $\Sigma^*$ of...

UPSP: Unique Predicate-based Source Selection for SPARQL Endpoint Federation

Efficient source selection is one of the most important optimization steps in federated SPARQL query processing as it leads to more efficient query execution plan generation. An over-estimation of the data sources will generate extra network traffic by retrieving irrelevant intermediate results. Such intermediate results will be excluded after performing joins between triple patterns. Consequen...

A Tool for Efficiently Processing SPARQL Queries on RDF Quads

We present a tool called RIQ (RDF Indexing on Quads) for efficiently processing SPARQL queries on large RDF datasets containing quads. RIQ’s novel design includes: (a) a vector representation of RDF graphs for efficient indexing, (b) a filtering index for efficiently organizing similar RDF graphs, and (c) a decrease-and-conquer strategy for efficient query processing using the filtering index t...

Substring Filtering for Low-Cost Linked Data Interfaces

Recently, Triple Pattern Fragments (TPFs) were introduced as a low-cost server-side interface when high numbers of clients need to evaluate SPARQL queries. Scalability is achieved by moving part of the query execution to the client, at the cost of elevated query times. Since the TPF interface purposely does not support complex constructs such as SPARQL filters, queries that use them need to be ...

Multidimensional Interfaces for Selecting Data within Ordinal Ranges

Linked Data interfaces exist in many flavours, as evidenced by subject pages, SPARQL endpoints, triple pattern interfaces, and data dumps. These interfaces are mostly used to retrieve parts of a complete dataset, such parts can for example be defined by ranges in one or more dimensions. Filtering Linked Data by dimensions such as time range, geospatial area, or genomic location, requires the lo...

Publication date: 2011